word level
- North America > United States > Virginia (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring
Su, Zhixiong, Wang, Yichen, Wan, Herun, Zhang, Zhaohan, Luo, Minnan
The misuse of large language models (LLMs) poses potential risks, motivating the development of machine-generated text (MGT) detection. Existing literature primarily concentrates on binary, document-level detection, thereby neglecting texts that are composed jointly by human and LLM contributions. Hence, this paper explores the possibility of fine-grained MGT detection under human-AI coauthoring. We suggest fine-grained detectors can pave pathways toward coauthored text detection with a numeric AI ratio. Specifically, we propose a dataset, HACo-Det, which produces human-AI coauthored texts via an automatic pipeline with word-level attribution labels. We retrofit seven prevailing document-level detectors to generalize them to word-level detection. Then we evaluate these detectors on HACo-Det on both word- and sentence-level detection tasks. Empirical results show that metric-based methods struggle to conduct fine-grained detection with a 0.462 average F1 score, while finetuned models show superior performance and better generalization across domains. However, we argue that fine-grained co-authored text detection is far from solved. We further analyze factors influencing performance, e.g., context window, and highlight the limitations of current methods, pointing to potential avenues for improvement.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
CUTE: Measuring LLMs' Understanding of Their Tokens
Edman, Lukas, Schmid, Helmut, Fraser, Alexander
Large Language Models (LLMs) show remarkable performance on a wide variety of tasks. Most LLMs split text into multi-character tokens and process them as atomic units without direct access to individual characters. This raises the question: To what extent can LLMs learn orthographic information? To answer this, we propose a new benchmark, CUTE, which features a collection of tasks designed to test the orthographic knowledge of LLMs. We evaluate popular LLMs on CUTE, finding that most of them seem to know the spelling of their tokens, yet fail to use this information effectively to manipulate text, calling into question how much of this knowledge is generalizable.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (2 more...)
Hierarchical Question-Image Co-Attention for Visual Question Answering Jiasen Lu
A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling "where to look" or visual attention, it is equally important to model "what words to listen to" or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention.
- North America > United States > Virginia (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
nerblackbox: A High-level Library for Named Entity Recognition in Python
We present nerblackbox, a python library to facilitate the use of state-of-the-art transformer-based models for named entity recognition. It provides simple-to-use yet powerful methods to access data and models from a wide range of sources, for fully automated model training and evaluation as well as versatile model inference. While many technical challenges are solved and hidden from the user by default, nerblackbox also offers fine-grained control and a rich set of customizable features. It is thus targeted both at application-oriented developers as well as machine learning experts and researchers.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Europe > Sweden (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Problems of Non-equivalent Words in Technical Translation
Translating words which do not have equivalent in target language is not easy and finding proper equivalent of those words are very important to render correctly and understandably, the article defines some thoughts and ideas of scientists on the common problems of non-equivalent words from English to Russian language and includes English and Russian examples and ideas of certain scientist. The English language is worldwide spoken and there are 1.35 billion English speakers and over 258 million Russian speakers according to the 2021s statistics. Inevitably, these billions of speakers around the world have connection and they may have deal in different criteria. In order to understand one another they need to have a pure and fully-understood language. These pure languages understanding directly relates to translation knowledge where linguists and translators need to work and research to eradicate misunderstanding. Misunderstandings mostly appear in non-equivalent words because there are different local and internal words like food, garment, cultural and traditional words and others in every notion. Truly, most of these words do not have equivalent in the target language and these words need to be worked and find their equivalent in the target language to fully understand the both languages. However, some of these non-equivalent words are already professionally rendered to the target language but still there many other words to be rendered. Hence, this research paper includes different ways and rules of rendering non-equivalent words from source language to the target language.
- Africa > South Africa > Gauteng > Pretoria (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Russia > Central Federal District > Voronezh Oblast > Voronezh (0.04)
- (2 more...)
CFSum: A Coarse-to-Fine Contribution Network for Multimodal Summarization
Xiao, Min, Zhu, Junnan, Lin, Haitao, Zhou, Yu, Zong, Chengqing
Multimodal summarization usually suffers from the problem that the contribution of the visual modality is unclear. Existing multimodal summarization approaches focus on designing the fusion methods of different modalities, while ignoring the adaptive conditions under which visual modalities are useful. Therefore, we propose a novel Coarse-to-Fine contribution network for multimodal Summarization (CFSum) to consider different contributions of images for summarization. First, to eliminate the interference of useless images, we propose a pre-filter module to abandon useless images. Second, to make accurate use of useful images, we propose two levels of visual complement modules, word level and phrase level. Specifically, image contributions are calculated and are adopted to guide the attention of both textual and visual modalities. Experimental results have shown that CFSum significantly outperforms multiple strong baselines on the standard benchmark. Furthermore, the analysis verifies that useful images can even help generate non-visual words which are implicitly represented in the image.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (6 more...)
Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding
Sankar, Sanjana, Beautemps, Denis, Elisei, Frédéric, Perrotin, Olivier, Hueber, Thomas
Hard of hearing or profoundly deaf people make use of cued speech (CS) as a communication tool to understand spoken language. By delivering cues that are relevant to the phonetic information, CS offers a way to enhance lipreading. In literature, there have been several studies on the dynamics between the hand and the lips in the context of human production. This article proposes a way to investigate how a neural network learns this relation for a single speaker while performing a recognition task using attention mechanisms. Further, an analysis of the learnt dynamics is utilized to establish the relationship between the two modalities and extract automatic segments. For the purpose of this study, a new dataset has been recorded for French CS. Along with the release of this dataset, a benchmark will be reported for word-level recognition, a novelty in the automatic recognition of French CS.
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
M3ST: Mix at Three Levels for Speech Translation
Cheng, Xuxin, Dong, Qianqian, Yue, Fengpeng, Ko, Tom, Wang, Mingxuan, Zou, Yuexian
How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's well known that data augmentation is an efficient method to improve performance for many tasks by enlarging the dataset. In this paper, we propose Mix at three levels for Speech Translation (M^3ST) method to increase the diversity of the augmented training corpus. Specifically, we conduct two phases of fine-tuning based on a pre-trained model using external machine translation (MT) data. In the first stage of fine-tuning, we mix the training corpus at three levels, including word level, sentence level and frame level, and fine-tune the entire model with mixed data. At the second stage of fine-tuning, we take both original speech sequences and original text sequences in parallel into the model to fine-tune the network, and use Jensen-Shannon divergence to regularize their outputs. Experiments on MuST-C speech translation benchmark and analysis show that M^3ST outperforms current strong baselines and achieves state-of-the-art results on eight directions with an average BLEU of 29.9.
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts
Shashirekha, H. L., Balouchzahi, F., Anusha, M. D., Sidorov, G.
The task of automatically identifying a language used in a given text is called Language Identification (LI). India is a multilingual country and many Indians especially youths are comfortable with Hindi and English, in addition to their local languages. Hence, they often use more than one language to post their comments on social media. Texts containing more than one language are called "code-mixed texts" and are a good source of input for LI. Languages in these texts may be mixed at sentence level, word level or even at sub-word level. LI at word level is a sequence labeling problem where each and every word in a sentence is tagged with one of the languages in the predefined set of languages. In order to address word level LI in code-mixed Kannada-English (Kn-En) texts, this work presents i) the construction of code-mixed Kn-En dataset called CoLI-Kenglish dataset, ii) code-mixed Kn-En embedding and iii) learning models using Machine Learning (ML), Deep Learning (DL) and Transfer Learning (TL) approaches. Code-mixed Kn-En texts are extracted from Kannada YouTube video comments to construct CoLI-Kenglish dataset and code-mixed Kn-En embedding. The words in CoLI-Kenglish dataset are grouped into six major categories, namely, "Kannada", "English", "Mixed-language", "Name", "Location" and "Other". The learning models, namely, CoLI-vectors and CoLI-ngrams based on ML, CoLI-BiLSTM based on DL and CoLI-ULMFiT based on TL approaches are built and evaluated using CoLI-Kenglish dataset. The performances of the learning models illustrated, the superiority of CoLI-ngrams model, compared to other models with a macro average F1-score of 0.64. However, the results of all the learning models were quite competitive with each other.
- North America > Mexico (0.14)
- Europe > Netherlands (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)